We consider channel coding for sending classical information through a quantum channel with
a given ensemble of quantum states (letter states). As is well known, it is generically possible in a
quantum channel that the information transmittable by block coding of length n exceeds n times
the maximum amount that can be sent without any coding scheme. This so-called superadditivity
in the classical capacity of a quantum channel is a distinct feature that cannot be found in a classical
memoryless channel. In this paper, a practical model of channel coding that shows this property is
presented. It consists of a simple codeword selection and the optimum decoding of the codewords
minimizing the average error probability. First, optimization of the decoding strategy is discussed.
Then the channel coding that shows the superadditivity in classical capacity is demonstrated.

PACS numbers: 03.65.Bz, 89.70.+c, 42.79.Sz, 89.80.+h, 32.80.-t
I. INTRODUCTION

The theory of quantum communication was initiated more than thirty years ago in order to take into account the quantum nature
of the signal carrier in the optical frequency domain. In this region, one faces features quite different from those of RF-band
communication, owing to the quantum noise of the signal carrier itself. The theory was then developed in the 1970s, revealing new aspects
of information transmission and signal detection. It now attracts much attention since new fields such as quantum
computation and cryptography have emerged. Assisted by ideas and methods from these fields, significant progress has been
made on a basic and old issue, the channel capacity. In particular, a theorem was established stating that the maximum
attainable rate of asymptotically error-free transmission of classical information using a given source of
quantum states (letter states) is precisely the Holevo bound (let us call it the quantum channel coding (QCC)
theorem) [1,2,3,4]. This rate is the asymptotic rate at infinite block length, $n \to \infty$, and is called the classical
capacity of a quantum channel. The term capacity of a quantum channel is now used in various contexts of quantum
information theory, including not only the transmission of a fixed classical alphabet but also the sending of intact quantum states.
In this paper, we confine ourselves to the transmission of classical information by use of a given letter-state ensemble, and
hereafter the term capacity is understood as the classical capacity for this case.

The QCC theorem guarantees the existence of codes that have the above asymptotic property, but does not tell directly
how to construct such codes from given letter states. For practical applications, simple and systematic coding-decoding
methods at finite block length are required. Such methods could immediately be applied to, for example,
advanced schemes of satellite communication and ultra-fast optical fiber communication. In these cases, the signal power
at the receiving end might be very weak due to long-distance transmission or a limited power supply, so that the main source
of error will be the nonorthogonality among the letter states, which is just the situation covered by the above theorem.

The purpose of this paper is to give some insights into practical aspects of quantum channel coding. In quantum
channel coding, block sequences are made as direct product states of the letter states, some of them are selected and
transmitted as codeword states, and they are then detected quantum mechanically. A quantum channel made of the
codeword states of length n is called an n-product channel. There are two essential ingredients in using this n-product
channel: one is the suitable selection of codeword states from all the possible sequences made of the letter states, and
the other is a collective decoding that detects each codeword state as a single state vector rather than decoding the
individual letter states separately. In particular, the latter fully utilizes superposition states of the codeword states, and
brings an inseparable structure among the letter states, which is often called entanglement. This remarkable feature
cannot be found in classical channel coding. As a consequence, the n-product channel can have a memory effect in
the sense that the channel matrix cannot be factorized into the channel matrices corresponding to each letter, i.e.,
$P(y_1 y_2 \cdots y_n | x_1 x_2 \cdots x_n) \neq \prod_{i=1}^{n} P(y_i | x_i)$. This is so even if neither the source system emitting the letter states nor the physical
process of transmission has a memory effect. This effect can be used to increase the reliability of information transmission.
In fact, if the codeword states are selected suitably, this effect makes it possible to send more classical information
through the n-product channel than n times the amount that can be sent through a single use of the initial
channel made merely of the letter states without any coding scheme. This so-called superadditivity in capacity is a
generic nature of a quantum channel, and is indeed an information theoretic quantum gain [5,6]. So the first step toward
finding the ultimate channel coding for the Holevo bound might be to construct codes that attain this quantum gain.

In this paper, quantum channels that show the superadditivity in capacity are described. We first consider the
optimization of decoding. The collective decoding used in the proof of the QCC theorem was the so-called square-root
measurement [?]. This allows one to derive an explicit decoding observable systematically from given codeword states.
In addition, this has been known to be almost optimum when the quantum states to be distinguished are equally
likely and almost orthogonal [?], which is the case for the typical sequences obtained at very long block length.
Therefore, it was sufficient for evaluating the upper bound of the decoding error. But this measurement is
actually more than that. In Sec. II, it will be pointed out that the square-root measurement becomes precisely
optimum in terms of the average error probability in certain cases of pure and linearly independent quantum states
that are neither equiprobable nor almost orthogonal. The optimality of the decoding strategy should be pursued in
order to achieve performance as high as possible, especially in practical channel coding of finite block length. When
the square-root measurement is not optimum, there is a method to construct the optimum one by modifying it. For
practical purposes, we present a basic scheme of the optimum collective decoding of codeword states in the case of a
pure-state channel in Sec. III. As for physical realizations of this scheme, the reader is referred to the subsequent
paper.

In order to quantify the superadditivity in capacity, the maximum attainable mutual information without any
coding must be known. This quantity is usually denoted as $C_1$. The optimum solutions for the prior probabilities and
the decoding observable that maximize the mutual information have been known only in a few cases [10,11,12]. In this
paper, the most basic case of binary and pure letter states is considered. In Sec. IV, we define a threshold point where
all the sequences are used as the codeword states and the accessible information (the maximum mutual information
attained by optimizing the decoding observable with the prior probabilities fixed) at block length n is exactly $nC_1$. At
this point, the optimum decoding is not collective but rather reduces to the separate measurement
which detects each letter state individually; that is, there is no room for generating entanglement correlation among
the letter states. This threshold point will be a useful guide for quantitative discussions. Sec. V is devoted to concluding
remarks.



II. DISTINGUISHING LINEARLY-INDEPENDENT QUANTUM STATES 

To begin with, we shall describe the conditions for optimality in a general decision problem of M-ary quantum
states. An ensemble of quantum states $\{\hat{\rho}_i\}$ is given with respective prior probabilities $\{\xi_i\}$. The decision process for these
signal states $\{\hat{\rho}_i\}$ can be described by a probability operator measure (POM) $\{\hat{\Pi}_i\}$ satisfying the resolution of the
identity $\sum_i \hat{\Pi}_i = \hat{I}$. The POM effecting the decision needs only M components $\hat{\Pi}_1, \hat{\Pi}_2, \ldots, \hat{\Pi}_M$, which are usually
called detection operators. What we are seeking here are the optimum detection operators minimizing the average
error probability. Defining the risk operators $\hat{W}_i = \xi_i \hat{\rho}_i$ and the Lagrange operator $\hat{\Upsilon} = \sum_i \hat{W}_i \hat{\Pi}_i$, the optimum
conditions are written as [7,13],

i) $\hat{\Pi}_i (\hat{W}_i - \hat{W}_j) \hat{\Pi}_j = 0$, $\forall (i,j)$,

ii) $\hat{\Upsilon} - \hat{W}_i \geq 0$, $\forall i$.

The minimum average error probability is given by

$$P_e(\mathrm{opt}) = 1 - \mathrm{Tr}\,\hat{\Upsilon}. \qquad (1)$$

When the signal states are pure ($\hat{\rho}_i = |\rho_i\rangle\langle\rho_i|$) and linearly independent, the optimum detection operators can
be given as a projection-valued measure (PVM) of rank 1, $\hat{\Pi}_i = |\omega_i\rangle\langle\omega_i|$. The set $\{|\omega_i\rangle\}$ forms a complete
orthonormal set in the Hilbert space $\mathcal{H}_s$ spanned by the signal states, and each of them is called a measurement
state. Introducing a matrix $X = (x_{ij}) = (\langle\omega_i|\rho_j\rangle)$, the above conditions are rewritten as

i') $\xi_i x_{ii} x_{ji}^* = \xi_j x_{ij} x_{jj}^*$, $\forall (i,j)$,

ii') $\Upsilon^{(m)} \equiv (\xi_i x_{ii} x_{ji}^* - \xi_m x_{im} x_{jm}^*) \geq 0$, $m = 1, \ldots, M$.



In general, it is complicated to derive explicit expressions for the optimum measurement states satisfying the
above conditions. They have been known only in certain cases. Otherwise, one has to rely on numerical
methods like the Bayes-cost-reduction algorithm [15]. The most tedious part of such a method is checking the second
condition ii'). But when the signal states are linearly independent, this is ensured more simply if

ii'') $\Upsilon' \equiv (\xi_i x_{ii} x_{ji}^*) \geq 0$

is satisfied. Its proof was given in the Appendix of Ref. [15].

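Now let us consider when the square-root measurement becomes optimum. The square-root measurement is defined
as follows,

$$|\mu_i\rangle = \hat{\rho}^{-1/2} |\tilde{\rho}_i\rangle, \qquad (2a)$$

$$\hat{\rho} = \sum_{i=1}^{M} |\tilde{\rho}_i\rangle\langle\tilde{\rho}_i|, \qquad (2b)$$

$$|\tilde{\rho}_i\rangle = \sqrt{\xi_i}\,|\rho_i\rangle. \qquad (2c)$$

As is well known, the conditional probability based on this measurement, $P(j|i) = |\langle\mu_j|\rho_i\rangle|^2$, can be calculated in the
following way. First, make the Gram matrix $\Gamma = (\langle\tilde{\rho}_i|\tilde{\rho}_j\rangle)$. Second, diagonalize it as

$$\Gamma = Q\,\mathrm{diag}(g_1, \ldots, g_M)\,Q^\dagger, \qquad (3)$$

where $Q$ is a unitary matrix. Third and finally, calculate

$$\sqrt{\Gamma} = Q\,\mathrm{diag}(\sqrt{g_1}, \ldots, \sqrt{g_M})\,Q^\dagger. \qquad (4)$$

Then the $(i,j)$-component of $\sqrt{\Gamma}$ is just $\langle\mu_i|\tilde{\rho}_j\rangle = \langle\mu_i|\rho_j\rangle\sqrt{\xi_j}$. Here we give a useful theorem for considering the
optimum collective decoding.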
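Theorem 1

If $\{|\rho_i\rangle\}$ are linearly independent, the measurement by $\{|\mu_i\rangle\}$ becomes optimum when all of the diagonal components
of $\sqrt{\Gamma}$ are equal.

Proof

Define $Y = (y_{ij}) = (\langle\mu_i|\rho_j\rangle)$. The measurement by $\{|\mu_i\rangle\}$ is optimum if

i') $\xi_i y_{ii} y_{ji}^* = \xi_j y_{ij} y_{jj}^*$, $\forall (i,j)$,
ii'') $\Upsilon' = (\xi_i y_{ii} y_{ji}^*) \geq 0$,

are satisfied. Denoting $\sqrt{\Gamma} = (\tilde{y}_{ij})$, with $\tilde{y}_{ij} = y_{ij}\sqrt{\xi_j}$, they can be rewritten as

i') $\tilde{y}_{ii}\tilde{y}_{ji}^* = \tilde{y}_{ij}\tilde{y}_{jj}^*$, $\forall (i,j)$,
ii'') $\Upsilon' = (\tilde{y}_{ii}\tilde{y}_{ji}^*) \geq 0$.

Since $\Gamma$ is nonnegative and Hermitian, so is $\sqrt{\Gamma}$. Therefore the above conditions reduce to

i') $\tilde{y}_{ii} = \tilde{y}_{jj}$,
ii'') $\Upsilon' = (\tilde{y}_{ii}\tilde{y}_{ij}) \geq 0$.

Under the first condition, the second one further reduces to $\tilde{y}_{11}\sqrt{\Gamma} \geq 0$. This is automatically satisfied since
$\sqrt{\Gamma} \geq 0$ ($\sqrt{\Gamma} \geq 0$ implies $\tilde{y}_{ii} \geq 0$, $\forall i$). Thus for the square-root measurement, the first condition just above is enough
for optimality, which proves the theorem. □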
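The three-step recipe of Eqs. (3) and (4) and the criterion of Theorem 1 can be tried out on the smallest case, two pure states with real overlap $\kappa$. The following is a minimal sketch, not part of the original analysis: it assumes the closed-form square root of a positive 2x2 matrix, $\sqrt{A} = (A + \sqrt{\det A}\,I)/\sqrt{\mathrm{Tr}A + 2\sqrt{\det A}}$, in place of a general diagonalization, and the function names `sqrt_2x2` and `srm_binary` are hypothetical.

```python
import math

def sqrt_2x2(g11, g12, g22):
    """Square root of a real symmetric positive 2x2 matrix (closed form)."""
    s = math.sqrt(g11 * g22 - g12 * g12)   # sqrt(det)
    t = math.sqrt(g11 + g22 + 2.0 * s)     # sqrt(trace + 2 sqrt(det))
    return (g11 + s) / t, g12 / t, (g22 + s) / t

def srm_binary(kappa, xi1):
    """Square-root measurement for two pure states with real overlap kappa.

    Returns the diagonal of sqrt(Gamma) and the channel matrix p[j][i] = P(j|i).
    """
    xi2 = 1.0 - xi1
    # Gram matrix of the weighted states sqrt(xi_i)|rho_i>
    g11, g12, g22 = xi1, kappa * math.sqrt(xi1 * xi2), xi2
    r11, r12, r22 = sqrt_2x2(g11, g12, g22)
    # P(j|i) = |<mu_j|rho_i>|^2 = (sqrt(Gamma))_{ji}^2 / xi_i
    p = [[r11 ** 2 / xi1, r12 ** 2 / xi2],
         [r12 ** 2 / xi1, r22 ** 2 / xi2]]
    return (r11, r22), p

kappa = 0.6
diag, p = srm_binary(kappa, 0.5)
error = 0.5 * p[1][0] + 0.5 * p[0][1]                  # average error of the SRM
helstrom = 0.5 * (1.0 - math.sqrt(1.0 - kappa ** 2))   # minimum error, equal priors
print(diag, error, helstrom)
```

With equal priors the two diagonal components of $\sqrt{\Gamma}$ coincide and the error of the square-root measurement agrees with the minimum error, as Theorem 1 states; with unequal priors (for example `srm_binary(0.6, 0.8)`) the diagonal components differ and the theorem gives no guarantee.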
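Thus the square-root measurement plays a practical role not only in the case of equally probable and almost
orthogonal states but also in the case that the signal states satisfy the above condition. A related discussion was given
by Ban et al. in the case of equally probable and symmetric states [16]. Even if the signal states do not satisfy the
above condition, $\{|\mu_i\rangle\}$ can be good initial states in searching for the optimum measurement states. First, note
the following remark.

Remark

If $\{|\rho_i\rangle\}$ are linearly independent, $\{|\mu_i\rangle\}$ are orthonormal.

Proof

The optimum measurement states $\{|\omega_i\rangle\}$ form a complete orthonormal set in $\mathcal{H}_s$. Define $\hat{X} = \sum_{i,j} x_{ij} |\omega_i\rangle\langle\omega_j|$ so that
$|\rho_i\rangle = \hat{X}|\omega_i\rangle$. Because of the linear independence of $\{|\rho_i\rangle\}$, $\hat{X}$ is nonsingular and $\hat{X}^{-1}$ exists. Then

$$\langle\mu_i|\mu_j\rangle = \langle\tilde{\rho}_i|\hat{\rho}^{-1}|\tilde{\rho}_j\rangle
= \sqrt{\xi_i}\,\langle\omega_i|\hat{X}^\dagger \hat{\rho}^{-1} \hat{X}|\omega_j\rangle\sqrt{\xi_j}
= \sqrt{\xi_i \xi_j}\,\langle\omega_i|\Big(\sum_k \xi_k |\omega_k\rangle\langle\omega_k|\Big)^{-1}|\omega_j\rangle
= \delta_{ij}.$$

□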

Thus, for linearly independent states, the set $\{|\mu_i\rangle\}$ is always a complete orthonormal set in $\mathcal{H}_s$. So it can be connected
with the optimum measurement states via a unitary operator $\hat{V}$ in $\mathcal{H}_s$ as $|\omega_i\rangle = \hat{V}|\mu_i\rangle$. Such an operator can
be constructed, for example, as a series of 2-dim rotations by applying the Bayes-cost-reduction algorithm [15]. This
algorithm consists of steps of solving a binary decision problem for a chosen pair of signal states $\{|\rho_i\rangle, |\rho_j\rangle\}$ on the
plane spanned by the corresponding pair of basis vectors $\{|\mu_i\rangle, |\mu_j\rangle\}$. At every step, two basis vectors are revised, and
the average error decreases or, at worst, remains the same. These 2-dim rotations are continued until the
optimum point is reached, where the previous conditions i') and ii') are satisfied. The resulting product of the rotations is just
the required unitary operator $\hat{V}$.



III. OPTIMUM COLLECTIVE DECODING OF CODEWORDS 



Deriving the analytic expression of the optimum measurement basis vectors $\{|\omega_i\rangle\}$ is a difficult job, but these basis
vectors can be constructed as explained in the previous section, and at the same time the channel matrix
can be obtained. Thus, for the sole purpose of evaluating performance, it is sufficient to derive these $\{|\omega_i\rangle\}$. From a practical point of
view, however, the basis vectors hardly imply a corresponding physical process. Although the set $\{|\omega_i\rangle\langle\omega_i|\}$ forms
a standard von Neumann measurement, its physical implementation usually remains a nontrivial problem. In this
section, we present a useful scheme for realizing the optimum collective decoding. In the case that the letter states
are binary, this scheme naturally leads to an implementation based on a quantum circuit and a well-defined physical
measurement.

Let the binary letter states be $\{|+\rangle, |-\rangle\}$, whose state overlap $\kappa = \langle +|-\rangle$ is assumed to be real and to lie in $0 < \kappa < 1$.
By the n-th extension, we pick M-ary codeword states $\{|S_1\rangle, \ldots, |S_M\rangle\}$ ($M \leq 2^n$) from the $2^n$ possible sequences
of length n, and use them with respective input probabilities $\{\zeta_1, \ldots, \zeta_M\}$. The rest of the sequences are denoted as
$\{|S_{M+1}\rangle, \ldots, |S_{2^n}\rangle\}$. Since the codeword states are linearly independent, they span the M-dim Hilbert space $\mathcal{H}_s$.
Let the n-th extended Hilbert space be $\mathcal{H}^{\otimes n}$. The optimum collective decoding is described by the orthonormal
basis vectors $\{|\omega_1\rangle, \ldots, |\omega_M\rangle\}$ derived in the way mentioned in the previous section. An orthonormal set
$\{|\omega_1\rangle, \ldots, |\omega_{2^n}\rangle\}$ in the extended space $\mathcal{H}^{\otimes n}$ can be made by adding the other basis vectors obtained by using the
Schmidt orthogonalization,

$$|\omega_i\rangle = \frac{|S_i\rangle - \sum_{k=1}^{i-1} |\omega_k\rangle\langle\omega_k|S_i\rangle}{\sqrt{1 - \sum_{k=1}^{i-1} |\langle\omega_k|S_i\rangle|^2}} \qquad (i = M+1, \ldots, 2^n). \qquad (5)$$

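We denote the expansion of all the sequences by the above basis vectors as,

$$\begin{pmatrix} |S_1\rangle \\ \vdots \\ |S_{2^n}\rangle \end{pmatrix} = B \begin{pmatrix} |\omega_1\rangle \\ \vdots \\ |\omega_{2^n}\rangle \end{pmatrix}, \qquad (6a)$$

$$B = (B_{ij}) = (\langle\omega_j|S_i\rangle). \qquad (6b)$$

Making two orthonormal basis vectors $\{|a\rangle, |b\rangle\}$ from $\{|+\rangle, |-\rangle\}$, we introduce the $2^n$ product basis vectors,

$$\begin{aligned}
|A_1\rangle &= |a\rangle \otimes |a\rangle \otimes \cdots \otimes |a\rangle \otimes |a\rangle, \\
|A_2\rangle &= |a\rangle \otimes |a\rangle \otimes \cdots \otimes |a\rangle \otimes |b\rangle, \\
&\ \ \vdots \\
|A_{2^n-1}\rangle &= |b\rangle \otimes |b\rangle \otimes \cdots \otimes |b\rangle \otimes |a\rangle, \\
|A_{2^n}\rangle &= |b\rangle \otimes |b\rangle \otimes \cdots \otimes |b\rangle \otimes |b\rangle,
\end{aligned} \qquad (7)$$

and denote another expansion by them as,

$$\begin{pmatrix} |S_1\rangle \\ \vdots \\ |S_{2^n}\rangle \end{pmatrix} = C \begin{pmatrix} |A_1\rangle \\ \vdots \\ |A_{2^n}\rangle \end{pmatrix}, \qquad (8a)$$

$$C = (C_{ij}) = (\langle A_j|S_i\rangle). \qquad (8b)$$

The two basis sets are connected via a unitary operator $\hat{U}$ on $\mathcal{H}^{\otimes n}$ as,

$$|\omega_i\rangle = \hat{U}^\dagger |A_i\rangle, \qquad (9a)$$

where

$$\hat{U}^\dagger = \sum_{i,j=1}^{2^n} u_{ji} |A_i\rangle\langle A_j|, \qquad u_{ij} = (B^{-1}C)_{ij}. \qquad (9b)$$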

Here the optimum collective decoding can be described by the set $\{\hat{U}^\dagger|A_1\rangle, \ldots, \hat{U}^\dagger|A_M\rangle\}$. The minimum error
probability is obtained as

$$P_e(\mathrm{opt}) = 1 - \sum_{m=1}^{M} \zeta_m |\langle S_m|\hat{U}^\dagger|A_m\rangle|^2. \qquad (10)$$

This clearly means that the optimum collective decoding $\{|\omega_m\rangle\}$ can be effected by (i) transforming the codeword
states $\{|S_m\rangle\}$ by the unitary transformation $\hat{U}$, and (ii) applying the von Neumann measurement $\{|A_m\rangle\langle A_m|\}$ to the
transformed codeword states. This type of detection scheme is called the received quantum state control [17,18]. The
final measurement is actually a separate measurement distinguishing each output letter state as $|a\rangle$ or $|b\rangle$ sequentially.

Now the measurement basis vectors need not be the above combination $\{|A_1\rangle, \ldots, |A_M\rangle\}$ but may be chosen as any
combination of M distinct elements of the product basis vectors, $\{|A_{i_1}\rangle, \ldots, |A_{i_M}\rangle\}$. Depending on the choice, the matrix
$C$ should be redefined by rearranging the order of the elements of the vector on the right-hand side of Eq. (8a) as
$\{|A_{i_1}\rangle, \ldots, |A_{i_M}\rangle, |A_{i_{M+1}}\rangle, \ldots, |A_{i_{2^n}}\rangle\}$. Then the unitary operator $\hat{U}$ constructed by Eq. (9) transforms the
codeword states adaptively to the chosen basis vectors $\{|A_{i_m}\rangle\}$ such that the minimum average error probability is attained
by the separate measurement. Note that $\hat{U}$ acts on the $2^n$-dim Hilbert space $\mathcal{H}^{\otimes n}$ rather than the M-dim space $\mathcal{H}_s$.
After the unitary transformation has been carried out, the resulting sequences at the final measurement always lie in
the space spanned by $\{|A_{i_1}\rangle, \ldots, |A_{i_M}\rangle\}$. If the transformation is skipped, all the product basis vectors will come out.
The channel model of this scheme is illustrated in the case of n = 3 and M = 4.

This kind of decomposition makes it easier to design the collective decoding systematically. For the final measurement
on each letter-state system, each process of which is described by the set $\{|a\rangle, |b\rangle\}$, the most suitable and implementable
method may be chosen. The main problem is the realization of the unitary transformation as an adaptor to the final
measurement. The corresponding physical processes are sometimes subtle, and the difficulty of finding them depends
on what kind of letter-state system is provided. However, if a 2-bit gate acting on qubits made of the two
basis vectors $\{|a\rangle, |b\rangle\}$ is available, the required unitary transformation can, in principle, be effected as a quantum
circuit of the kind used in quantum computation.

Barenco et al. already showed that an exact simulation of any discrete unitary operator can be carried out by
using a quantum computing network [19]. This can be directly translated into the real operation of $\hat{U}$ on the
codeword states. First, $\hat{U}$ is decomposed into $U(2)$ operators $\hat{T}_{[j,i]}$ by applying the algorithm proposed by Reck
et al. [20] as,

$$\hat{U} = \hat{T}_{[2,1]} \hat{T}_{[3,1]} \cdots \hat{T}_{[2^n,2^n-2]} \hat{T}_{[2^n,2^n-1]}, \qquad (11a)$$

where

$$\hat{T}_{[j,i]} = \exp[-\gamma_{ji}(|A_j\rangle\langle A_i| - |A_i\rangle\langle A_j|)]. \qquad (11b)$$

Then the above 2-dim rotations $\hat{T}_{[j,i]}$ are converted into quantum circuits by using the formula established by
Barenco et al. [19]. Here the following point should be noted. Since the qubits are the letter states themselves constituting
the codeword states, the gates should consist of the single physical species from which the letter states are made. Such
gates known so far are Sleator and Weinfurter's gate consisting of two-state atoms [?] and the quantum phase gate
acting on two photon-polarization states [?]. In the subsequent paper, examples of the required quantum circuits
are described based on such gates.
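The factorization of a unitary operator into 2-dim rotations can be illustrated for the simplest situation, a real orthogonal matrix. The sketch below is an assumption-laden stand-in rather than the algorithm of Reck et al. itself: it uses plain Givens-style elimination, the ordering of the rotations need not match Eq. (11a), and the names `givens_decompose` and `rebuild` are hypothetical.

```python
import math

def givens_decompose(u):
    """Reduce a real orthogonal matrix to a diagonal of +-1 by 2-dim rotations.

    Returns (rotations, diag); each rotation (j, i, c, s) acts on rows j < i,
    and applying them in order to u leaves the diagonal matrix diag.
    """
    n = len(u)
    a = [row[:] for row in u]                  # work on a copy
    rotations = []
    for j in range(n - 1):                     # column being cleaned
        for i in range(j + 1, n):              # zero the entry a[i][j] with row j
            r = math.hypot(a[j][j], a[i][j])
            if r < 1e-15:
                continue
            c, s = a[j][j] / r, a[i][j] / r
            for k in range(n):
                aj, ai = a[j][k], a[i][k]
                a[j][k] = c * aj + s * ai
                a[i][k] = -s * aj + c * ai
            rotations.append((j, i, c, s))
    return rotations, [a[k][k] for k in range(n)]

def rebuild(rotations, diag):
    """Recover the original matrix by undoing the rotations in reverse order."""
    n = len(diag)
    a = [[diag[r] if r == k else 0.0 for k in range(n)] for r in range(n)]
    for j, i, c, s in reversed(rotations):
        for k in range(n):
            aj, ai = a[j][k], a[i][k]
            a[j][k] = c * aj - s * ai          # transpose (inverse) rotation
            a[i][k] = s * aj + c * ai
    return a

# Example: the normalized 4x4 Hadamard matrix
h4 = [[0.5 * x for x in row]
      for row in ([1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1])]
rotations, diag = givens_decompose(h4)
rebuilt = rebuild(rotations, diag)
print(len(rotations), "two-dimensional rotations")
```

For an N-dim matrix at most N(N-1)/2 rotations are needed, each acting on a plane spanned by two basis vectors; it is these elementary rotations that would then be translated into gate sequences.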

IV. SUPERADDITIVITY IN CAPACITY OF QUANTUM CHANNEL 

The purpose of this section is to demonstrate simple codes that show the superadditivity in capacity. Let us
introduce some definitions. Prepare an ensemble of letter states $s = \{\hat{s}_1, \ldots, \hat{s}_l\}$ in a Hilbert space $\mathcal{H}_s$. They
represent input letters $\{1, \ldots, l\}$. Let $\xi = \{\xi_1, \ldots, \xi_l\}$ be the corresponding prior probabilities. A decoding process is
described by a probability operator measure (POM) on $\mathcal{H}_s$, $\pi = \{\hat{\pi}_1, \ldots, \hat{\pi}_{l'}\}$, representing output letters $\{1, \ldots, l'\}$.
We call the mapping $\{1, \ldots, l\} \mapsto \{1, \ldots, l'\}$ the initial quantum channel. Fixing $s$ and $\pi$, the mutual information
is defined as

$$I_1(\xi, s : \pi) = \sum_{i=1}^{l} \xi_i \sum_{j=1}^{l'} P(j|i) \log \frac{P(j|i)}{\sum_{k=1}^{l} \xi_k P(j|k)}, \qquad (12)$$

where $P(j|i) = \mathrm{Tr}(\hat{\pi}_j \hat{s}_i)$ is the conditional probability that the letter j is chosen when the letter i has been sent. The
maximum value of this quantity optimized with respect to $\xi$ and $\pi$ is usually denoted as $C_1$,

$$C_1(s) = \sup_{\xi, \pi} I_1(\xi, s : \pi). \qquad (13)$$

A basic channel coding consists of (i) concatenation of the letter states into the $l^n$ block sequences $\{\hat{s}_{i_1} \otimes \cdots \otimes \hat{s}_{i_n}\}$,
(ii) pruning of them into M-ary codewords $S = \{\hat{S}_1, \ldots, \hat{S}_M\}$, which can encode $\log_2 M$ classical bits, and (iii) finding
an appropriate decoding POM $\Pi = \{\hat{\Pi}_1, \ldots, \hat{\Pi}_{M'}\}$ in the extended Hilbert space $\mathcal{H}_s^{\otimes n}$. The obtained channel is called
the n-th extended channel. Assigning the input distribution $\zeta = \{\zeta_1, \ldots, \zeta_M\}$ to the codewords, the mutual information
is defined also for this channel as,

$$I_n(\zeta, S : \Pi) = \sum_{i=1}^{M} \zeta_i \sum_{j=1}^{M'} P(j|i) \log \frac{P(j|i)}{\sum_{k=1}^{M} \zeta_k P(j|k)}, \qquad (14)$$

where $P(j|i) = \mathrm{Tr}(\hat{\Pi}_j \hat{S}_i)$. Let us define the n-th order capacity as,

$$C_n(s) = \sup_{\zeta, S, \Pi} I_n(\zeta, S : \Pi). \qquad (15)$$

Then generally, $C_n(s) \geq nC_1(s)$ holds for a quantum channel [5,6]. This property is the superadditivity in capacity.
One can define the limit $C(s) = \lim_{n\to\infty} C_n(s)/n$. The quantum channel coding theorem [1,2,3,4] says that this $C(s)$ is
just the attainable rate of asymptotically error-free transmission, hence the intrinsic capacity of the initial quantum
channel, and is exactly equal to the Holevo bound,

$$C(s) = \sup_{\xi} \Big[ H\Big(\sum_i \xi_i \hat{s}_i\Big) - \sum_i \xi_i H(\hat{s}_i) \Big], \qquad (16)$$

where $H(\hat{s}_i) = -\mathrm{Tr}(\hat{s}_i \log \hat{s}_i)$ is the von Neumann entropy of the density operator $\hat{s}_i$. This theorem ensures that there
exist codes such that the decoding error vanishes asymptotically as $n \to \infty$ if the transmission rate $R = \frac{1}{n}\log_2 M$ is
kept below $C(s)$.

A remaining big problem is to find such codes. For this purpose, the first thing to be understood is the
superadditivity in capacity. It should be stressed that the strict superadditivity $C_n(s) > nC_1(s)$ is definitely impossible
in a classical memoryless channel. In contrast, a quantum channel has a memory effect seen in the channel matrix as
$P(y_1 y_2 \cdots y_n | x_1 x_2 \cdots x_n) \neq \prod_{i=1}^{n} P(y_i | x_i)$, even when the source of letter states and the physical transmission channel
do not have any memory effects. This memory effect is caused by the decoding process itself. That is, when a collective
decoding is applied to the codewords, the entanglement structure among the letter states prevents, in general, the
channel matrix from being factorized as above. For attaining the strict superadditivity, an appropriate memory
effect needs to be generated by an appropriate selection of codewords and a collective decoding for them. This property
is thus indeed a quantum gain in information transmission.

The essential role of entanglement for the information theoretic quantum gain can be stressed rigorously by the
following theorem.

Theorem 2

Suppose that two ensembles of letter states $s^{(1)} = \{\hat{s}_i^{(1)}\}$ in $\mathcal{H}_s^{(1)}$ and $s^{(2)} = \{\hat{s}_j^{(2)}\}$ in $\mathcal{H}_s^{(2)}$ are given, and the first
order capacities $C_1(s^{(1)})$ and $C_1(s^{(2)})$ are attained for the prior probabilities $\xi^{(1)}$ and $\xi^{(2)}$, and the detection operators
$\pi^{(1)}$ and $\pi^{(2)}$, respectively. Then the accessible information of the channel with the inputs $s^{(1)} \otimes s^{(2)}$ and the prior
probabilities $\xi^{(1)} \otimes \xi^{(2)}$ is given as

$$\sup_{\Pi} I(\xi^{(1)} \otimes \xi^{(2)}, s^{(1)} \otimes s^{(2)} : \Pi) = C_1(s^{(1)}) + C_1(s^{(2)}), \qquad (17)$$

when

$$\Pi = \pi^{(1)} \otimes \pi^{(2)}.$$

The proof is given in Appendix A [22].

Now suppose that the supremum in Eq. (13) is attained when $\xi = \xi^*$ and $\pi = \pi^*$. If all of the $l^n$ sequences
$\{\hat{s}_{i_1} \otimes \cdots \otimes \hat{s}_{i_n}\}$ are used as the codewords with fixed prior probabilities $\{\xi^*_{i_1} \times \cdots \times \xi^*_{i_n}\}$, then, according to the above
theorem, the optimum decoding $\Pi$ maximizing the mutual information $I_n(\xi^{*\otimes n}, s^{\otimes n} : \Pi)$ is

$$\hat{\Pi}_{i_1 \cdots i_n} = \hat{\pi}^*_{i_1} \otimes \cdots \otimes \hat{\pi}^*_{i_n}. \qquad (18)$$

The accessible information is simply n times $C_1(s)$,

$$\sup_{\Pi} I_n(\xi^{*\otimes n}, s^{\otimes n} : \Pi) = nC_1(s). \qquad (19)$$

Thus in this restricted case, there is no room for entanglement to be generated. So this case provides a threshold point
for the information theoretic quantum gain. Once the input probabilities are redistributed so as to reduce the weights
of some codewords, entanglement becomes possible, and the quantum gain can be obtained by an appropriate
decoding $\Pi$. Concerning the threshold point, it might be worth mentioning a similar theorem in terms of the
average error probability.
Theorem 3

For given states $\{\hat{s}_i\}$ with prior probabilities $\{\xi_i\}$, let $\{\hat{\pi}_i\}$ be the optimum POM minimizing the average error
probability. Then in distinguishing the product states $\{\hat{s}_{i_1} \otimes \cdots \otimes \hat{s}_{i_n}\}$ associated with the prior probabilities $\{\xi_{i_1} \times
\cdots \times \xi_{i_n}\}$, the optimum POM minimizing the average error probability is,

$$\hat{\Pi}_{i_1 \cdots i_n} = \hat{\pi}_{i_1} \otimes \cdots \otimes \hat{\pi}_{i_n}, \qquad (20)$$

and the minimum average error probability is given as,

$$P_e(\mathrm{opt}) = 1 - (\mathrm{Tr}\,\hat{\upsilon})^n, \qquad (21)$$

where $\hat{\upsilon} = \sum_i \xi_i \hat{s}_i \hat{\pi}_i$ is a Lagrange operator.

Its proof is given in Appendix B [23]. It should be noted that the optimum POM for the letter states, $\{\hat{\pi}_i\}$, need
not, in general, coincide with the one for the mutual information, $\{\hat{\pi}^*_i\}$ in Eq. (18), even for the same ensemble
$\{\hat{s}_i\}$. The result of this theorem will be discussed later with the example of the channel coding.

A simple example of channel coding showing the superadditivity in capacity was already given by the authors
in the case of the third extension of the binary pure-state channel [24]. Here we generalize this example to the n-th
order extension, and demonstrate the relation

$$\sup_{\Pi} I_n(\zeta, s^{\otimes n} : \Pi) > nC_1(s), \qquad (22)$$

which ensures the strict superadditivity. The initial channel is made of binary letters $s = \{|+\rangle, |-\rangle\}$ whose inner
product $\kappa = \langle +|-\rangle$ is assumed to be real. It is well known that the first order capacity is achieved by the symmetric
channel with the detection operators $\{\hat{\pi}_1, \hat{\pi}_2\}$ which minimize the average error probability [10,11,12]. These operators
are given as $\hat{\pi}_i = |\omega_i\rangle\langle\omega_i|$ with

$$|\omega_1\rangle = \sqrt{\frac{1-p}{1-\kappa^2}}\,|+\rangle - \sqrt{\frac{p}{1-\kappa^2}}\,|-\rangle, \qquad (23a)$$

$$|\omega_2\rangle = -\sqrt{\frac{p}{1-\kappa^2}}\,|+\rangle + \sqrt{\frac{1-p}{1-\kappa^2}}\,|-\rangle, \qquad (23b)$$

where $p = (1 - \sqrt{1-\kappa^2})/2$ is the minimum average error probability. The first order capacity is given simply as,

$$C_1(s) = 1 + (1-p)\log_2(1-p) + p\log_2 p. \qquad (24)$$
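Equation (24) is easy to evaluate numerically. The sketch below also evaluates the Holevo bound of Eq. (16) for the same binary pure letter states, under the assumption (a standard result, not derived in this section) that the supremum is reached at equal priors, where the mixture has eigenvalues $(1 \pm \kappa)/2$. The function names are hypothetical.

```python
import math

def h2(x):
    """Binary entropy in bits, with the convention 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def min_error(kappa):
    """Minimum average error probability p for equiprobable binary pure states."""
    return 0.5 * (1.0 - math.sqrt(1.0 - kappa ** 2))

def c1(kappa):
    """First order capacity, Eq. (24): C_1 = 1 - h2(p)."""
    return 1.0 - h2(min_error(kappa))

def holevo(kappa):
    """Holevo bound for binary pure states at equal priors (assumed optimal)."""
    return h2(0.5 * (1.0 + kappa))

for kappa in (0.2, 0.5, 0.8):
    print(kappa, c1(kappa), holevo(kappa))
```

The difference `holevo(kappa) - c1(kappa)` is the gap that channel coding with collective decoding can, at best, close in the limit of long block length.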

In the n-th order extension, half of all the $2^n$ sequences are used as the codeword states and are input to the channel
with equal prior probabilities. Such $2^{n-1}$ codeword states are generated in a recursive manner from the four codeword
states $\{|{+}{+}{+}\rangle, |{+}{-}{-}\rangle, |{-}{-}{+}\rangle, |{-}{+}{-}\rangle\}$ in the third order extension, where $|{+}{-}{-}\rangle = |+\rangle \otimes |-\rangle \otimes |-\rangle$, etc. That
is, defining vectors consisting of bra-state vectors of codeword states,

$$\gamma^{(3)} = \begin{pmatrix} \langle{+}{+}{+}| \\ \langle{+}{-}{-}| \\ \langle{-}{-}{+}| \\ \langle{-}{+}{-}| \end{pmatrix}, \qquad (25)$$

they are given as,

[Eq. (26): recursive definition of $\gamma^{(n)}$ in terms of $\gamma^{(n-1)}$; not legible in the source.] (26)
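For illustration, the codeword set can also be generated directly rather than recursively. The sketch below rests on one assumption read off from the four third-order codewords and the $[[n, n-1, 2]]$ label: the codewords are taken to be the sequences containing an even number of $|-\rangle$ letters. The function names are hypothetical.

```python
from itertools import product

def even_parity_codewords(n):
    """All length-n sequences over {'+', '-'} with an even number of '-'."""
    words = []
    for head in product("+-", repeat=n - 1):
        parity = head.count("-") % 2
        # The last letter is fixed so that the total number of '-' is even
        words.append("".join(head) + ("-" if parity else "+"))
    return words

def overlap(w1, w2, kappa):
    """<w1|w2> for product states: kappa to the power of the Hamming distance."""
    distance = sum(a != b for a, b in zip(w1, w2))
    return kappa ** distance

print(even_parity_codewords(3))
```

Any two distinct codewords differ in at least two letters, so their overlap is at most $\kappa^2$, whereas neighbouring sequences in the full set of $2^n$ sequences overlap by $\kappa$.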



This codeword selection can be specified by the notation $[[n, n-1, 2]]$ according to the nomenclature of coding theory.
These codewords are decoded by the square-root measurement. The measurement basis vectors are defined as,

$$|\mu_i\rangle = \hat{\rho}^{-1/2}|S_i\rangle, \qquad \hat{\rho} = \sum_{i=1}^{2^{n-1}} |S_i\rangle\langle S_i|, \qquad (27)$$

where the prior probabilities are not included in the density matrix $\hat{\rho}$, unlike Eq. (2), simply for mathematical
convenience. As will become clear soon, these basis vectors effect the optimum collective decoding for the above
codewords. We have to evaluate the channel matrix $P(j|i) = |\langle\mu_j|S_i\rangle|^2$. The Gram matrix of the codewords is given
by



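$$\Gamma^{(n)} \equiv \gamma^{(n)} \cdot \gamma^{(n)\dagger} = \begin{pmatrix} \Gamma^{(n-1)} & \kappa^2 \Lambda^{(n-1)} \\ \kappa^2 \Lambda^{(n-1)} & \Gamma^{(n-1)} \end{pmatrix}, \qquad (28a)$$

where

$$\Lambda^{(n-1)} = \begin{pmatrix} \Gamma^{(n-2)} & \Lambda^{(n-2)} \\ \Lambda^{(n-2)} & \Gamma^{(n-2)} \end{pmatrix}, \qquad (28b)$$

and

$$\Gamma^{(3)} = \begin{pmatrix} 1 & \kappa^2 & \kappa^2 & \kappa^2 \\ \kappa^2 & 1 & \kappa^2 & \kappa^2 \\ \kappa^2 & \kappa^2 & 1 & \kappa^2 \\ \kappa^2 & \kappa^2 & \kappa^2 & 1 \end{pmatrix}, \qquad (28c)$$

$$\Lambda^{(3)} = \begin{pmatrix} \kappa^2 & 1 & 1 & 1 \\ 1 & \kappa^2 & 1 & 1 \\ 1 & 1 & \kappa^2 & 1 \\ 1 & 1 & 1 & \kappa^2 \end{pmatrix}. \qquad (28d)$$

$\Gamma^{(n)}$ and $\Lambda^{(n)}$ can be diagonalized by a $2^{n-1} \times 2^{n-1}$ matrix

$$Q^{(n)} = H_{2^{n-1}} / \sqrt{2^{n-1}}, \qquad (29a)$$

where $H_{2^{n-1}}$ is the Hadamard matrix defined by

$$H_{2k} = \begin{pmatrix} H_k & H_k \\ H_k & -H_k \end{pmatrix}, \qquad H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. \qquad (29b)$$

The diagonalized matrices,

$$G^{(n)} = Q^{(n)\dagger}\,\Gamma^{(n)}\,Q^{(n)}, \qquad (30a)$$

$$F^{(n)} = Q^{(n)\dagger}\,\Lambda^{(n)}\,Q^{(n)}, \qquad (30b)$$

can be decomposed into $2^{n-2} \times 2^{n-2}$ matrices $G^{(n-1)}$ and $F^{(n-1)}$ as,

$$G^{(n)} = \begin{pmatrix} G^{(n-1)} + \kappa^2 F^{(n-1)} & 0 \\ 0 & G^{(n-1)} - \kappa^2 F^{(n-1)} \end{pmatrix}, \qquad (31a)$$

$$F^{(n)} = \begin{pmatrix} G^{(n-1)} + F^{(n-1)} & 0 \\ 0 & G^{(n-1)} - F^{(n-1)} \end{pmatrix}. \qquad (31b)$$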

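After recursive decompositions, they can be represented as,

$$G^{(n)} = \begin{pmatrix} A(n,1) & & \\ & \ddots & \\ & & A(n,2^{n-3}) \end{pmatrix}, \qquad (32a)$$

$$F^{(n)} = \begin{pmatrix} B(n,1) & & \\ & \ddots & \\ & & B(n,2^{n-3}) \end{pmatrix}, \qquad (32b)$$

where $A(n,k)$ and $B(n,k)$ ($k = 1, \ldots, 2^{n-3}$) are $4 \times 4$ block matrices defined by,

$$A(n,k) = a(n,k)\,G^{(3)} + b(n,k)\,F^{(3)}, \qquad (33a)$$

$$B(n,k) = c(n,k)\,G^{(3)} + d(n,k)\,F^{(3)}, \qquad (33b)$$

with

$$G^{(3)} = \begin{pmatrix} 1+3\kappa^2 & & & \\ & 1-\kappa^2 & & \\ & & 1-\kappa^2 & \\ & & & 1-\kappa^2 \end{pmatrix}, \qquad (34a)$$

$$F^{(3)} = \begin{pmatrix} 3+\kappa^2 & & & \\ & -1+\kappa^2 & & \\ & & -1+\kappa^2 & \\ & & & -1+\kappa^2 \end{pmatrix}. \qquad (34b)$$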

The coefficients in Eq. (33) are determined by the following recursive formula for $k = 1, \ldots, 2^{n-4}$,

$$\begin{aligned}
a(n,k) &= a(n-1,k) + \kappa^2 c(n-1,k), & a(n,2^{n-4}+k) &= a(n-1,k) - \kappa^2 c(n-1,k), \\
b(n,k) &= b(n-1,k) + \kappa^2 d(n-1,k), & b(n,2^{n-4}+k) &= b(n-1,k) - \kappa^2 d(n-1,k), \\
c(n,k) &= a(n-1,k) + c(n-1,k), & c(n,2^{n-4}+k) &= a(n-1,k) - c(n-1,k), \\
d(n,k) &= b(n-1,k) + d(n-1,k), & d(n,2^{n-4}+k) &= b(n-1,k) - d(n-1,k),
\end{aligned} \qquad (35a)$$

with the initial values,

$$\begin{aligned}
a(4,1) &= a(4,2) = 1, \\
b(4,1) &= -b(4,2) = \kappa^2, \\
c(4,1) &= c(4,2) = d(4,1) = -d(4,2) = 1.
\end{aligned} \qquad (35b)$$

Thus the diagonal matrices $A(n,k)$ are obtained, and the square root of the Gram matrix is given as,

$$\sqrt{\Gamma^{(n)}} = Q^{(n)} \begin{pmatrix} \sqrt{A(n,1)} & & \\ & \ddots & \\ & & \sqrt{A(n,2^{n-3})} \end{pmatrix} Q^{(n)\dagger}. \qquad (36)$$

For representing the result, let us define,

$$\alpha(n,k) \equiv \sqrt{(1+3\kappa^2)\,a(n,k) + (3+\kappa^2)\,b(n,k)}, \qquad (37a)$$

$$\beta(n,k) \equiv \sqrt{(1-\kappa^2)\,[a(n,k) - b(n,k)]}, \qquad (37b)$$

and

$$D(n,k) \equiv Q^{(3)} \sqrt{A(n,k)}\, Q^{(3)\dagger} \qquad (37c)$$

$$= \begin{pmatrix} \mu(n,k) & \nu(n,k) & \nu(n,k) & \nu(n,k) \\ \nu(n,k) & \mu(n,k) & \nu(n,k) & \nu(n,k) \\ \nu(n,k) & \nu(n,k) & \mu(n,k) & \nu(n,k) \\ \nu(n,k) & \nu(n,k) & \nu(n,k) & \mu(n,k) \end{pmatrix}, \qquad (37d)$$

where

$$\mu(n,k) \equiv \tfrac{1}{4}[\alpha(n,k) + 3\beta(n,k)], \qquad (37e)$$

$$\nu(n,k) \equiv \tfrac{1}{4}[\alpha(n,k) - \beta(n,k)]. \qquad (37f)$$
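Then $\sqrt{\Gamma^{(n)}}$ can be represented as,

$$\sqrt{\Gamma^{(n)}} = \begin{pmatrix}
R(n,1) & R(n,2) & R(n,3) & R(n,4) & \cdots & R(n,2^{n-3}-1) & R(n,2^{n-3}) \\
R(n,2) & R(n,1) & R(n,4) & R(n,3) & \cdots & R(n,2^{n-3}) & R(n,2^{n-3}-1) \\
R(n,3) & R(n,4) & R(n,1) & R(n,2) & \cdots & R(n,2^{n-3}-3) & R(n,2^{n-3}-2) \\
R(n,4) & R(n,3) & R(n,2) & R(n,1) & \cdots & R(n,2^{n-3}-2) & R(n,2^{n-3}-3) \\
\vdots & & & & \ddots & & \vdots
\end{pmatrix}, \qquad (38a)$$

where

$$\begin{pmatrix} R(n,1) \\ \vdots \\ R(n,2^{n-3}) \end{pmatrix} = \frac{1}{2^{n-3}} H_{2^{n-3}} \begin{pmatrix} D(n,1) \\ \vdots \\ D(n,2^{n-3}) \end{pmatrix}. \qquad (38b)$$

$R(n,k)$ can be further arranged in the following form,

$$R(n,k) = \begin{pmatrix} u(n,k) & v(n,k) & v(n,k) & v(n,k) \\ v(n,k) & u(n,k) & v(n,k) & v(n,k) \\ v(n,k) & v(n,k) & u(n,k) & v(n,k) \\ v(n,k) & v(n,k) & v(n,k) & u(n,k) \end{pmatrix}. \qquad (39)$$

The two kinds of components $u(n,k)$ and $v(n,k)$ can be calculated by

$$\begin{pmatrix} u(n,1) \\ \vdots \\ u(n,2^{n-3}) \end{pmatrix} = \frac{1}{2^{n-3}} H_{2^{n-3}} \begin{pmatrix} \mu(n,1) \\ \vdots \\ \mu(n,2^{n-3}) \end{pmatrix}, \qquad (40a)$$

$$\begin{pmatrix} v(n,1) \\ \vdots \\ v(n,2^{n-3}) \end{pmatrix} = \frac{1}{2^{n-3}} H_{2^{n-3}} \begin{pmatrix} \nu(n,1) \\ \vdots \\ \nu(n,2^{n-3}) \end{pmatrix}. \qquad (40b)$$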

After squaring each component of VTW, the channel matrix P(j\i) can be obtained. According to the symmetry 
seen in Eq. (|38|), it is easy to see that the mutual information is given as, 



I n (S : n) = n - 1 + [u(n, kf \ogu{n, k) 2 + 3v(n, kf \ogv(n, kf] 



(41) 



fc=i 



In order to see the quantum gain, the difference between the mutual information per letter state $I_n(S:\mu)/n$ and
$C_1(s)$ is plotted as a function of $\kappa$ in Fig. 2. The dashed line corresponds to the case of $n = 2$, where the two
codeword states $\{|{+}{+}\rangle, |{-}{-}\rangle\}$ are sent with the same prior probabilities and are detected by the
optimum measurement minimizing the average error probability. In this case, no positive quantum gain was found in the
whole region of $\kappa$. For $n = 3$ to $13$ (solid lines), the difference becomes positive at the larger side of $\kappa$.
This positive gain clearly shows the superadditivity in capacity. Let $\kappa^*$ be the value of $\kappa$ ($< 1$) for which
the difference becomes zero. Then for $\kappa^* < \kappa < 1$ the difference is always positive, and as $n$ increases,
$\kappa^*$ decreases, so that the positive gain appears in a wider region of $\kappa$. This relation is plotted in Fig. 3.
The circles represent the points $(\kappa^*, n)$. The solid line is just a guide for the eye. The dashed line corresponds
to the curve $n = 2\kappa^{-1.5}$. This figure may provide a rough estimate of the $n$ needed to obtain the positive gain.
That is, for a given $\kappa$, one may guess that the superadditivity will appear when the order of extension for our
[[n, n-1, 2]] code is taken as an integer $n$ larger than $2\kappa^{-1.5}$. Unfortunately, we did not succeed in giving a
more rigorous condition. The maximum amount of the positive gain is still quantitatively unsatisfactory compared with
the gap between the intrinsic capacity $C(s)$ and the first-order capacity $C_1(s)$. Actually it is less than 10% of the
maximum gap. For the case of $n = 9$, for which the maximum gain was obtained, $C(s)$, $I_9(S:\mu)/9$ and $C_1(s)$ versus
$\kappa$ are plotted in Fig. 4.
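The rough estimate read off from Fig. 3 can be turned into a one-line helper. This is only a sketch of the empirical guide $n > 2\kappa^{-1.5}$; it is not a rigorous condition, and the function name is hypothetical.

```python
import math

def rough_min_block_length(kappa):
    """Smallest integer n strictly larger than 2 * kappa**(-1.5),
    following the empirical guide n > 2 kappa^(-1.5)."""
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must satisfy 0 < kappa < 1")
    return math.floor(2 * kappa ** -1.5) + 1

for kappa in (0.3, 0.5, 0.7, 0.9):
    print(kappa, rough_min_block_length(kappa))
```

For example, the guide suggests $n \ge 3$ at $\kappa = 0.9$ and $n \ge 6$ at $\kappa = 0.5$.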
As far as the minimum average error probability is concerned, the square-root measurement we used is the optimum for
our [[n, n-1, 2]] code (Eq. (36)) because all of the diagonal components of $\sqrt{\Gamma^{(n)}}$ are equal to $u(n,1)$,
as seen from Eqs. (38) and (39), for which Theorem 1 holds. The minimum average error probabilities versus $\kappa$ are
plotted for $n = 3, 5, 7, 9, 11, 13$ by the solid lines in Fig. 5. For a fixed $\kappa$, they increase with $n$. The dotted
lines represent the minimum average error probabilities corresponding to the threshold points, that is, Eq. (21).
Although the error probabilities of our code [[n, n-1, 2]] are smaller than those of the threshold points, they are still
larger than $p = (1 - \sqrt{1 - \kappa^2})/2$ (the minimum error of the initial channel) at the larger side of $\kappa$.
In spite of this, the quantum gain $I_n(S:\mu)/n > C_1(s)$ reveals itself within such regions.
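The coexistence of a positive gain with an increased decoding error can already be seen for $n = 3$. The following sketch uses the fact that the square-root measurement gives $P(i|i) = u(n,1)^2$, so that the average error probability is $1 - u(n,1)^2$. It assumes the $n = 3$ values $\alpha = \sqrt{1+3\kappa^2}$ and $\beta = \sqrt{1-\kappa^2}$ (an assumption here, since their definition is not reproduced in this part of the text). The function names are hypothetical.

```python
import math

def pe_code_n3(kappa):
    """Average error 1 - u(3,1)^2 of the square-root measurement
    for the four codewords of the [[3, 2, 2]] code."""
    alpha = math.sqrt(1 + 3 * kappa**2)   # assumed alpha(3, 1)
    beta = math.sqrt(1 - kappa**2)        # assumed beta(3, 1)
    u = (alpha + 3 * beta) / 4
    return 1 - u * u

def pe_letter(kappa):
    """Minimum error p of the initial (single-letter) channel."""
    return (1 - math.sqrt(1 - kappa**2)) / 2

for kappa in (0.3, 0.5, 0.9):
    print(kappa, round(pe_code_n3(kappa), 4), round(pe_letter(kappa), 4))
```

With these formulas the code error lies below $p$ at $\kappa = 0.5$ but exceeds $p$ at $\kappa = 0.9$, which is the region where the gain is positive.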

The same tendency could be seen in other codes. Let us consider the so-called simplex code $[[2^r - 1, r, 2^{r-1}]]$,
for example. All the codewords are the same distance apart. Let $n = 2^r - 1$ and $M = 2^r$. Suppose that $M$-ary
codewords are used with equal prior probabilities. Then the square-root measurement is again the optimum collective
decoding for them. Defining,

$$\alpha(n) = \sqrt{1 + (M-1)\,\kappa^{M/2}}, \tag{42a}$$

$$\beta(n) = \sqrt{1 - \kappa^{M/2}}, \tag{42b}$$

it is straightforward to see,

$$\left(\sqrt{\Gamma^{(n)}}\right)_{ii} = u(n) = \frac{1}{M}\left[\alpha(n) + (M-1)\beta(n)\right], \tag{43a}$$

$$\left(\sqrt{\Gamma^{(n)}}\right)_{ij} = v(n) = \frac{1}{M}\left[\alpha(n) - \beta(n)\right], \quad i \neq j. \tag{43b}$$

The mutual information is then given by,

$$I_n(S:\mu) = \log M + u(n)^2 \log u(n)^2 + (M-1)\,v(n)^2 \log v(n)^2. \tag{44}$$
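Equations (42)-(44) are closed-form and can be evaluated directly. The sketch below does so for a simplex code with $M = 2^r$ equiprobable codewords, taking logarithms to base 2 (an assumption; the text does not state the base here). The function name is hypothetical.

```python
import math

def simplex_info_per_letter(r, kappa):
    """I_n / n from Eqs. (42)-(44) for the simplex code
    [[2^r - 1, r, 2^(r-1)]] with M = 2^r equiprobable codewords.
    Valid for 0 < kappa <= 1."""
    M = 2 ** r
    n = M - 1
    overlap = kappa ** (M // 2)           # all codewords are M/2 apart
    alpha = math.sqrt(1 + (M - 1) * overlap)
    beta = math.sqrt(1 - overlap)
    u2 = ((alpha + (M - 1) * beta) / M) ** 2
    v2 = ((alpha - beta) / M) ** 2
    info = math.log2(M) + u2 * math.log2(u2) + (M - 1) * v2 * math.log2(v2)
    return info / n

print(simplex_info_per_letter(3, 0.9))    # [[7, 3, 4]] code
print(simplex_info_per_letter(2, 0.9))    # r = 2 gives the [[3, 2, 2]] code
```

For $r = 2$ the simplex code coincides with the [[3, 2, 2]] code, so the same function covers both codes compared in the text.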

This code is compared with the previous one at $n = 7$ in terms of both the mutual information per letter state
and the minimum average error probability in Figs. 6 and 7, respectively. The [[7,3,4]] simplex code has higher
distinguishability of the codewords than the [[7,6,2]] code, so that its minimum average error is much smaller, while
its mutual information is not necessarily larger than that of the latter. The former overcomes the latter only in the
region $0.82 < \kappa < 1$. Around this region, the minimum average error probability is, again, larger than the one in
the initial channel, $p$.

This tendency may be understood as a result of two facts: in order to produce the quantum gain, a quantum
interference among the codeword states must occur to reduce certain components of the channel matrix, and such
a quantum interference occurs more drastically when the nonorthogonality of the codeword states is larger, hence at
the larger side of $\kappa$. On the other hand, the nonorthogonality causes a certain amount of decoding error as well.
This decreases the amount of transmittable information, while the quantum gain may appear if the quantum interference
reduces certain components of the channel matrix in a proper manner. Thus at shorter block length, the quantum
gain, i.e. the difference $I_n - nC_1 > 0$, is likely to appear at the larger side of $\kappa$, being accompanied by a
certain amount of decoding error. A rough guide for constructing a channel attaining the quantum gain for a given
$\kappa$ is as follows: first, the ratio of the number of message bits $k$ to the block length $n$ should be taken larger
than $C_1(s)$ at this $\kappa$, and then the codeword states should be selected so that their mutual distances are as equal
as possible. As $\kappa$ becomes closer to unity, it becomes more effective in obtaining the quantum gain to take the
ratio $k/n$ small and to select codeword states that are distant from each other. As an example, the simplex code
[[7,3,4]] is compared with the code [[3,2,2]] in the region $0.75 < \kappa < 1$ in Fig. 8. Toward the upper end of this
region, the [[7,3,4]] code (solid line) is more efficient in terms of the mutual information than the [[3,2,2]] code
(dotted line). For constructing codes such that the decoding error can be as small as possible and the rate can reach
the Holevo bound, a larger block length at which the typical subspace can be well defined is necessary. Practical
methods for obtaining larger quantum gain must be studied in great detail along this direction.

V. CONCLUDING REMARKS 

The initial channel considered in this paper is the binary symmetric pure-state channel, which is the simplest
quantum channel. When the binary letter states are orthogonal, the channel is error free, and there is no quantum
regime, that is, the n-th order capacity obviously satisfies the strict additivity $C_n = nC_1$. When they are
nonorthogonal, i.e. $0 < \kappa < 1$, the strict superadditivity $C_n > nC_1$, in turn, reveals itself. What we have shown
in this paper is a demonstration of $I_n > nC_1$, which ensures the strict superadditivity.

The first part of this paper was devoted to the optimum collective decoding of the codeword states at the minimum
average error probability. The scheme we proposed consists of a unitary transformation and a separate measurement.
The unitary transformation generates appropriate superposition states among the codeword states such
that the minimum average error is attained at the separate measurement. These output states are no longer separable
into the letter states. Minimization of decoding error is just a manifestation of the optimum use of quantum
interference associated with this kind of superposition states. The required unitary transformation is essentially a
conditional dynamics in a higher dimensional Hilbert space, and is realized by a quantum circuit which is capable of
manipulating each letter state in a conditional manner depending on the other letter states. Thus our scheme suggests
a state-of-the-art quantum decoder structure.

Quantum channels involving the above collective decoding have a memory effect, i.e.
$P(y_1 y_2 \cdots y_n | x_1 x_2 \cdots x_n) \neq \prod_{i=1}^{n} P(y_i | x_i)$. This inseparability of the quantum channel
is a direct origin of the quantum gain $I_n > nC_1$. Only when the codeword states are selected suitably does this
inseparability lead to the quantum gain. We gave a heuristic approach to attain this gain for a given letter-state
ensemble. Our examples are always accompanied by a larger amount of decoding error than the minimum average error in
the initial channel. As mentioned in the previous section, this dilemma arises because both the decoding error and the
quantum gain originate from the nonorthogonality of the letter states.

Although some basic aspects for realizing the quantum gain were clarified in this paper, practical codes that transmit
a classical alphabet faithfully at the maximum rate are still completely unknown. Even in classical information theory,
realization of codes that achieve asymptotically error-free transmission at the rate $C_1$ is very difficult. It might
be an interesting problem to consider an application of some conventional error-correcting codes to the n-product
quantum channels described in this paper. This would lead to realization of asymptotically error-free transmission at
the rate $I_n/n$ ($> C_1$). For the ultimate quantum channel coding, typicality of the Hilbert space spanned by the
codeword states and sophisticated quantum error correction might be considered together.
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APPENDIX A: PROOF OF THE THEOREM 2 



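We first prove the following lemma.

Lemma

$$
\sup_{\hat\Pi} I(\xi^{(1)}\otimes\xi^{(2)}, \hat s^{(1)}\otimes\hat s^{(2)} : \hat\Pi)
= \sup_{\hat\pi^{(1)}} I(\xi^{(1)}, \hat s^{(1)} : \hat\pi^{(1)})
+ \sup_{\hat\pi^{(2)}} I(\xi^{(2)}, \hat s^{(2)} : \hat\pi^{(2)}).
\tag{A1}
$$

Proof of lemma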



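1) Clearly,

$$
\sup_{\hat\Pi} I(\xi^{(1)}\otimes\xi^{(2)}, \hat s^{(1)}\otimes\hat s^{(2)} : \hat\Pi)
\ge \sup_{\hat\pi^{(1)}\otimes\hat\pi^{(2)}} I(\xi^{(1)}\otimes\xi^{(2)}, \hat s^{(1)}\otimes\hat s^{(2)} : \hat\pi^{(1)}\otimes\hat\pi^{(2)}),
\tag{A2}
$$

because the product measurements $\hat\pi^{(1)}\otimes\hat\pi^{(2)}$ form a subset of all POMs on the composite space. For
a product prior and a product measurement the mutual information is the sum of the two single-system mutual
informations, so the right-hand side equals the right-hand side of Eq. (A1).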
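The additivity used in step 1) is a purely classical property of a product channel with a product prior, and it can be checked numerically. In the sketch below the two channel matrices and priors are arbitrary illustrative numbers, not quantities from this paper, and the function names are hypothetical.

```python
import math

def mutual_information(prior, channel):
    """I(X:Y) in bits for prior[i] and channel[i][j] = P(j|i)."""
    num_out = len(channel[0])
    p_out = [sum(prior[i] * channel[i][j] for i in range(len(prior)))
             for j in range(num_out)]
    total = 0.0
    for i, p_i in enumerate(prior):
        for j in range(num_out):
            p_ji = channel[i][j]
            if p_i > 0 and p_ji > 0:
                total += p_i * p_ji * math.log2(p_ji / p_out[j])
    return total

def product_channel(ch1, ch2):
    """Two independent uses: P(j1 j2 | i1 i2) = P1(j1|i1) * P2(j2|i2)."""
    return [[a * b for a in row1 for b in row2]
            for row1 in ch1 for row2 in ch2]

def product_prior(pr1, pr2):
    """Prior of independent inputs, ordered like product_channel rows."""
    return [a * b for a in pr1 for b in pr2]

# Arbitrary example channels and priors (illustrative values only).
ch1 = [[0.9, 0.1], [0.2, 0.8]]
ch2 = [[0.7, 0.3], [0.4, 0.6]]
pr1, pr2 = [0.5, 0.5], [0.3, 0.7]

joint = mutual_information(product_prior(pr1, pr2), product_channel(ch1, ch2))
separate = mutual_information(pr1, ch1) + mutual_information(pr2, ch2)
print(joint, separate)
```

The two printed numbers agree up to rounding, which is the equality invoked for product measurements.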



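2) Representing $P(j|i_1,i_2) = \mathrm{Tr}\big(\hat\Pi_j\, \hat s^{(1)}_{i_1}\otimes\hat s^{(2)}_{i_2}\big)$, where
$\{\hat\Pi_j\}$ is a POM on $\mathcal{H}_s^{(1)}\otimes\mathcal{H}_s^{(2)}$,

$$
\begin{aligned}
I(\xi^{(1)}\otimes\xi^{(2)}, \hat s^{(1)}\otimes\hat s^{(2)} : \hat\Pi)
&= \sum_{i_1}\sum_{i_2}\sum_{j} \xi^{(1)}_{i_1}\xi^{(2)}_{i_2} P(j|i_1,i_2)
\Big\{ \log\Big[\frac{P(j|i_1,i_2)}{\sum_{k_1}\xi^{(1)}_{k_1}P(j|k_1,i_2)}\Big]
+ \log\Big[\frac{\sum_{k_1}\xi^{(1)}_{k_1}P(j|k_1,i_2)}{\sum_{k_1}\sum_{k_2}\xi^{(1)}_{k_1}\xi^{(2)}_{k_2}P(j|k_1,k_2)}\Big] \Big\} \\
&= \sum_{i_2}\xi^{(2)}_{i_2}\Big\{ \sum_{i_1}\sum_{j}\xi^{(1)}_{i_1}P(j|i_1,i_2)
\log\Big[\frac{P(j|i_1,i_2)}{\sum_{k_1}\xi^{(1)}_{k_1}P(j|k_1,i_2)}\Big] \Big\} \\
&\quad + \sum_{j}\sum_{i_2}\xi^{(2)}_{i_2}\Big[\sum_{i_1}\xi^{(1)}_{i_1}P(j|i_1,i_2)\Big]
\log\Big\{\frac{\sum_{k_1}\xi^{(1)}_{k_1}P(j|k_1,i_2)}{\sum_{k_2}\xi^{(2)}_{k_2}\sum_{k_1}\xi^{(1)}_{k_1}P(j|k_1,k_2)}\Big\}.
\end{aligned}
\tag{A3}
$$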



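We introduce two kinds of POM on $\mathcal{H}_s^{(1)}$ and $\mathcal{H}_s^{(2)}$ as,

$$\hat\Pi^{(1)}_{j|i_2} \equiv \mathrm{Tr}^{(2)}\big(\hat\Pi_j\, \hat I^{(1)}\otimes\hat s^{(2)}_{i_2}\big), \tag{A4}$$

$$\hat\Pi^{(2)}_{j} \equiv \mathrm{Tr}^{(1)}\Big(\hat\Pi_j\, \sum_{i_1}\xi^{(1)}_{i_1}\hat s^{(1)}_{i_1}\otimes\hat I^{(2)}\Big), \tag{A5}$$

where $\mathrm{Tr}^{(i)}$ means taking the trace over the space $\mathcal{H}_s^{(i)}$, and $\hat I^{(i)}$ is the identity
operator in $\mathcal{H}_s^{(i)}$. Representing the conditional probabilities in Eq. (A3) as,

$$P(j|i_1,i_2) = \mathrm{Tr}^{(1)}\big(\hat\Pi^{(1)}_{j|i_2}\,\hat s^{(1)}_{i_1}\big), \tag{A6}$$

$$\sum_{i_1}\xi^{(1)}_{i_1}P(j|i_1,i_2) = \mathrm{Tr}^{(2)}\big(\hat\Pi^{(2)}_{j}\,\hat s^{(2)}_{i_2}\big), \tag{A7}$$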



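we see that Eq. (A3) is equivalent to the following,

$$
I(\xi^{(1)}\otimes\xi^{(2)}, \hat s^{(1)}\otimes\hat s^{(2)} : \hat\Pi)
= \sum_{i_2}\xi^{(2)}_{i_2}\, I(\xi^{(1)}, \hat s^{(1)} : \hat\Pi^{(1)}_{i_2}) + I(\xi^{(2)}, \hat s^{(2)} : \hat\Pi^{(2)}).
\tag{A8}
$$

Hence

$$
\begin{aligned}
\sup_{\hat\Pi} I(\xi^{(1)}\otimes\xi^{(2)}, \hat s^{(1)}\otimes\hat s^{(2)} : \hat\Pi)
&\le \sum_{i_2}\xi^{(2)}_{i_2} \sup_{\hat\Pi^{(1)}_{i_2}} I(\xi^{(1)}, \hat s^{(1)} : \hat\Pi^{(1)}_{i_2})
+ \sup_{\hat\Pi^{(2)}} I(\xi^{(2)}, \hat s^{(2)} : \hat\Pi^{(2)}) \\
&\le \sup_{\hat\pi^{(1)}} I(\xi^{(1)}, \hat s^{(1)} : \hat\pi^{(1)})
+ \sup_{\hat\pi^{(2)}} I(\xi^{(2)}, \hat s^{(2)} : \hat\pi^{(2)}).
\end{aligned}
\tag{A9}
$$

The two inequalities (A2) and (A9) prove the lemma.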

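Now suppose that $I(\xi^{(i)}, \hat s^{(i)} : \hat\pi^{(i)})$ is maximized when $\xi^{(i)} = \xi^{(i)}_{\rm opt}$ and
$\hat\pi^{(i)} = \hat\pi^{(i)}_{\rm opt}$. Then the lemma means that

$$
\sup_{\hat\Pi} I(\xi^{(1)}_{\rm opt}\otimes\xi^{(2)}_{\rm opt}, \hat s^{(1)}\otimes\hat s^{(2)} : \hat\Pi)
= C_1(\hat s^{(1)}) + C_1(\hat s^{(2)}).
\tag{A10}
$$

On the other hand,

$= C_1(\hat s^{(1)}) + C_1(\hat s^{(2)})$. (A11)

These two equations prove the theorem. □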

APPENDIX B: PROOF OF THE THEOREM 3 

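The necessary and sufficient conditions for $\{\hat\pi_j\}$ to be the optimum POM are

i) $\hat\pi_j(\hat W_j - \hat W_k)\hat\pi_k = 0$, $\forall (j,k)$,

ii) $\hat\upsilon - \hat W_j \ge 0$, $\forall j$,

where $\hat W_i = \xi_i \hat s_i$ and $\hat\upsilon = \sum_j \hat W_j \hat\pi_j$ are the risk operators and the Lagrange
operator, respectively. We would like to prove that $\hat\Pi_{j_1\cdots j_l}$
($= \hat\pi_{j_1}\otimes\cdots\otimes\hat\pi_{j_l}$) satisfy

i)' $\hat\Pi_{j_1\cdots j_l}(\hat W_{j_1\cdots j_l} - \hat W_{k_1\cdots k_l})\hat\Pi_{k_1\cdots k_l} = 0$,

ii)' $\hat\Upsilon - \hat W_{j_1\cdots j_l} \ge 0$,

where $\hat W_{j_1\cdots j_l} = \hat W_{j_1}\otimes\cdots\otimes\hat W_{j_l}$ and
$\hat\Upsilon = \sum_{j_1\cdots j_l}\hat W_{j_1\cdots j_l}\hat\Pi_{j_1\cdots j_l} = \hat\upsilon^{\otimes l}$. Here note
the following formula:

$$
\begin{aligned}
A_1\otimes\cdots\otimes A_l - B_1\otimes\cdots\otimes B_l
&= (A_1 - B_1)\otimes A_2\otimes\cdots\otimes A_l \\
&\quad + B_1\otimes(A_2 - B_2)\otimes\cdots\otimes A_l + \cdots \\
&\quad + B_1\otimes B_2\otimes\cdots\otimes(A_l - B_l).
\end{aligned}
\tag{B1}
$$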
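For orientation, the two lowest cases of the formula (B1) can be written out explicitly. This is only a worked illustration of the telescoping structure; each line follows by adding and subtracting the mixed products.

```latex
% l = 2: add and subtract B_1 \otimes A_2
A_1 \otimes A_2 - B_1 \otimes B_2
  = (A_1 - B_1) \otimes A_2 + B_1 \otimes (A_2 - B_2),

% l = 3: add and subtract B_1 \otimes A_2 \otimes A_3 and B_1 \otimes B_2 \otimes A_3
A_1 \otimes A_2 \otimes A_3 - B_1 \otimes B_2 \otimes B_3
  = (A_1 - B_1) \otimes A_2 \otimes A_3
  + B_1 \otimes (A_2 - B_2) \otimes A_3
  + B_1 \otimes B_2 \otimes (A_3 - B_3).
```

In each term exactly one factor is a difference $A_m - B_m$, with $B$'s to its left and $A$'s to its right.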



Then to ensure i)', we rearrange the left-hand side as

$$
\begin{aligned}
\hat\Pi_{j_1\cdots j_l}(\hat W_{j_1\cdots j_l} - \hat W_{k_1\cdots k_l})\hat\Pi_{k_1\cdots k_l}
&= \hat\pi_{j_1}\hat W_{j_1}\hat\pi_{k_1}\otimes\cdots\otimes\hat\pi_{j_l}\hat W_{j_l}\hat\pi_{k_l} \\
&\quad - \hat\pi_{j_1}\hat W_{k_1}\hat\pi_{k_1}\otimes\cdots\otimes\hat\pi_{j_l}\hat W_{k_l}\hat\pi_{k_l},
\end{aligned}
\tag{B2}
$$

and put $A_m = \hat\pi_{j_m}\hat W_{j_m}\hat\pi_{k_m}$ and $B_m = \hat\pi_{j_m}\hat W_{k_m}\hat\pi_{k_m}$. Since i) is
equivalent to $A_m - B_m = 0$, after Eq. (B1) is applied to Eq. (B2) we obtain i)'. Similarly, to show ii)', we put
$A_m = \hat\upsilon$ and $B_m = \hat W_{j_m}$. From the definitions, $A_m \ge 0$ and $B_m \ge 0$. In addition, ii) is
nothing but $A_m \ge B_m$. So when $\hat\Upsilon - \hat W_{j_1\cdots j_l}$ is decomposed by Eq. (B1), its nonnegative
definiteness is obvious. □

FIG. 1. The channel model obtained by decomposing the collective decoding into the unitary transformation and the separate 
measurement in the case of n = 3 and M = 4. 
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Fig 2 

of 'Quantum channels showing superadditivity in capacity' 
by M. Sasaki, K. Kato, M. Izutsu, and O. Hirota 

FIG. 2. The difference between the mutual information per letter state $I_n(S:\mu)/n$ and $C_1(s)$ for $n = 2$ to $13$.
The dashed line corresponds to the case of $n = 2$, while the solid lines correspond to the case of $n = 3$ to $13$.

FIG. 3. The relation between $n$ and $\kappa^*$ ($< 1$) at which $I_n(S:\mu)/n = C_1(s)$ holds. The $\kappa^*$'s are denoted
by the circles. The line is just a guide for the eye. The dashed line corresponds to the curve $n = 2\kappa^{-1.5}$.

FIG. 4. $C(s)$, $I_9(S:\mu)/9$ and $C_1(s)$ as functions of $\kappa$.

FIG. 5. The minimum average error probabilities corresponding to the code [[n, n-1, 2]] (solid lines), the initial
channel, i.e., $p$ (dashed line), and the threshold points (dotted lines) as functions of $\kappa$.

FIG. 6. $C(s)$, $I_7(S:\mu)/7$ for both the [[7,3,4]] simplex code and the [[7,6,2]] code, and $C_1(s)$ as functions of $\kappa$.

FIG. 7. The minimum average error probabilities corresponding to the [[7,3,4]] simplex code, the [[7,6,2]] code, and the
initial channel, i.e., $p$ (dashed line) as functions of $\kappa$.

FIG. 8. The mutual information per letter corresponding to the [[7,3,4]] simplex code (solid line) and the code [[3,2,2]]
(dashed line) as functions of $\kappa$.
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